fix: adding data specific p-value filters #788
@@ -37,6 +37,10 @@ def __init__(
finngen_susie_finemapping_cs_summary_files=finngen_susie_finemapping_cs_summary_files,
)

finngen_finemapping_df = finngen_finemapping_df.validate_lead_pvalue(
Minor stylistic comment, and it is absolutely my preference, but I like to use as few variables as possible:
(
    # Read the FinnGen fine-mapped dataset and convert it to a study locus:
    FinnGenFinemapping.from_finngen_susie_finemapping(
        spark=session.spark,
        finngen_susie_finemapping_snp_files=finngen_susie_finemapping_snp_files,
        finngen_susie_finemapping_cs_summary_files=finngen_susie_finemapping_cs_summary_files,
    )
    # Flag sub-significant loci:
    .validate_lead_pvalue(
        pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold
    )
    # Write the output:
    .df.write.mode(session.write_mode).parquet(
        finngen_finemapping_out
    )
)
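For readers unfamiliar with the step, here is a minimal plain-Python sketch of what flagging sub-significant loci by lead p-value could look like. It is not the gentropy implementation: the `Locus` class, its field names, the `flag_sub_significant` helper and the flag text are all hypothetical, and it assumes the p-value is stored as a mantissa and a base-10 exponent. The point it illustrates is that loci above the cutoff are flagged, not dropped.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Locus:
    """Hypothetical stand-in for one study locus row."""

    lead_variant: str
    pvalue_mantissa: float
    pvalue_exponent: int
    quality_controls: list = field(default_factory=list)


def neg_log10_pvalue(locus: Locus) -> float:
    """Return -log10(p) computed from the mantissa and exponent."""
    return -(math.log10(locus.pvalue_mantissa) + locus.pvalue_exponent)


def flag_sub_significant(loci: list, pvalue_cutoff: float) -> list:
    """Append a QC flag to every locus whose lead p-value is above the cutoff."""
    threshold = -math.log10(pvalue_cutoff)
    for locus in loci:
        if neg_log10_pvalue(locus) < threshold:
            # Hypothetical flag text; the locus is kept, only annotated.
            locus.quality_controls.append("Subsignificant p-value")
    return loci


loci = flag_sub_significant(
    [Locus("1_100_A_T", 2.0, -8), Locus("2_200_G_C", 3.0, -4)],
    pvalue_cutoff=1e-5,
)
```

With a cutoff of 1e-5, the first locus (p = 2e-8) stays unflagged and the second (p = 3e-4) is flagged, while both remain in the list.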
Please check this carefully: I removed the variables, but I am not sure it is right.
I might be missing something, but I could not find the flagging of the in-house fine-mapped datasets. Also, you mentioned there would be a specific p-value cutoff for UKBPPP.
@@ -169,6 +175,7 @@ class FinngenFinemappingConfig(StepConfig):
_target_: str = (
"gentropy.finngen_finemapping_ingestion.FinnGenFinemappingIngestionStep"
)
finngen_finemapping_lead_pvalue_threshold: float = 1e-5
There's one thing I'm not sure about. You have added finngen_finemapping_lead_pvalue_threshold to the relevant config and refer to it as pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold in the step. However, finngen_finemapping_lead_pvalue_threshold is not an argument of FinnGenFinemappingIngestionStep. Is that OK? Would all parameters in the config be passed to the step? Would that cause any problems? @project-defiant, what do you think?
OK, I double-checked with @project-defiant and @d0choa: all parameters in the StepConfig classes need to be parameters of the step's init function.
src/gentropy/eqtl_catalogue.py
Outdated
EqtlCatalogueFinemapping.from_susie_results(processed_susie_df)
# Flagging sub-significant loci:
.validate_lead_pvalue(
pvalue_cutoff=EqtlCatalogueConfig().eqtl_lead_pvalue_threshold
Same comment: EqtlCatalogueConfig().eqtl_lead_pvalue_threshold has to be passed as a parameter of the step.
)
# Flagging sub-significant loci:
.validate_lead_pvalue(
pvalue_cutoff=FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold
Same comment: FinngenFinemappingConfig().finngen_finemapping_lead_pvalue_threshold has to be an init parameter of the step.
.filter_credible_set(credible_interval=CredibleInterval.IS99)
# Flagging sub-significant loci:
.validate_lead_pvalue(
pvalue_cutoff=WindowBasedClumpingStepConfig().gwas_significance
Same comment.
Any parameter that is defined in a StepConfig needs to be defined as a parameter of the init method of the respective step class; otherwise the step will fail. The reason is that the step object would be initialised with an unexpected parameter.
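To make the failure mode concrete, here is a minimal sketch. It assumes that every field of a config is forwarded as a keyword argument to the target class, which is what the comment above describes. The `instantiate` helper is a hypothetical stand-in for that mechanism, and `IngestionStep` and `IngestionConfig` are illustrative names, not classes from this repository.

```python
from dataclasses import asdict, dataclass


class IngestionStep:
    """Hypothetical step whose init does NOT accept the threshold."""

    def __init__(self, out_path: str) -> None:
        self.out_path = out_path


@dataclass
class IngestionConfig:
    """Hypothetical config that declares one field the step does not know."""

    out_path: str = "out.parquet"
    lead_pvalue_threshold: float = 1e-5


def instantiate(step_cls, config):
    """Stand-in for the config machinery: pass every field as a keyword argument."""
    return step_cls(**asdict(config))


try:
    instantiate(IngestionStep, IngestionConfig())
    error = None
except TypeError as exc:
    # Python rejects the keyword argument that the init method does not declare.
    error = str(exc)

print(error)
```

The call raises a `TypeError` naming `lead_pvalue_threshold` as an unexpected keyword argument, so under this assumption the step cannot even be constructed until the parameter is added to its init signature.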
Fixed. Please check.
As it stands, this PR makes it impossible to parametrise the p-value threshold of the PICS step. If other steps are called as stand-alone commands, users can provide a custom p-value threshold. However, this is not true for the PICS step: the applied cutoff is defined by the config (WindowBasedClumpingStepConfig().gwas_significance), whose value cannot be overridden.
I think this might be fine for now, but we should follow the pattern of the other steps, where all parameters used inside a step are customisable.
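A small sketch of the difference, using hypothetical names throughout (`ClumpingConfig`, `HardCodedStep`, `ParametrisedStep`); the default of 5e-8 is only an illustrative value, not taken from this PR. The first step reads the threshold from a freshly constructed config inside its body, so a caller's override never reaches it. The second takes the threshold as an init parameter.

```python
from dataclasses import asdict, dataclass


@dataclass
class ClumpingConfig:
    """Hypothetical config; the default value is illustrative only."""

    gwas_significance: float = 5e-8


class HardCodedStep:
    """Reads the threshold from a new config instance, ignoring the caller."""

    def __init__(self) -> None:
        # Always the class default: an override made elsewhere is not seen here.
        self.threshold = ClumpingConfig().gwas_significance


class ParametrisedStep:
    """Receives the threshold as an init parameter, so callers can override it."""

    def __init__(self, gwas_significance: float) -> None:
        self.threshold = gwas_significance


custom_config = ClumpingConfig(gwas_significance=1e-5)

hard_coded = HardCodedStep()
parametrised = ParametrisedStep(**asdict(custom_config))
```

Here `hard_coded.threshold` stays at the config default, whereas `parametrised.threshold` picks up the custom value of 1e-5.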
Context
Data specific lead p-value CS filters.
What does this PR implement
Missing
Before submitting